{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Histograms"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is a common practice to create histograms to explore your data as it can give you a general idea of what your data looks like. A histogram is a summary of the variation in a measured variable. It shows the number of samples that occur in a category. A histogram is a type of frequency distribution.  \n",
    "\n",
    "Histograms work by binning the entire range of values into a series of intervals and then counting how many values fall into each interval. While the intervals are often of equal size, they are not required to be."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The ``inline`` flag will use the appropriate backend to make figures appear inline in the notebook.  \n",
    "%matplotlib inline\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# `plt` is an alias for the `matplotlib.pyplot` module\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load Data\n",
    "\n",
    "The data we will use to demonstrate histograms is the House Sales in King County, USA dataset: https://www.kaggle.com/harlfoxem/housesalesprediction). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv('data/kingCountyHouseData.csv')\n",
    "\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Histograms using Pandas\n",
    "\n",
    "The goal of this particular visualization is to make a histogram on the `price` column. In doing this, you will see creating a data visualization can be an iterative process. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['price'].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Using the default settings is not a good idea\n",
    "# Keep in mind that visualizations are an interative process.\n",
    "df['price'].hist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# One solution is to rotate your xticklabels\n",
    "df['price'].hist()\n",
    "plt.xticks(rotation = 90)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# if you want a quick solution to make the xticklabels readable,\n",
    "# try changing the plot style \n",
    "plt.style.use('seaborn')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Change the number of bins\n",
    "# Seems better, but we still have empty space\n",
    "df['price'].hist(bins = 30)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# visualizing a subset of the data\n",
    "price_filter = df.loc[:, 'price'] <= 3000000\n",
    "df.loc[price_filter, 'price'].hist(bins = 30)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# you can also change the edgecolor and linewidth\n",
    "price_filter = df.loc[:, 'price'] <= 3000000\n",
    "\n",
    "# you can also change the edgecolor and linewidth\n",
    "df.loc[price_filter, 'price'].hist(bins = 30,\n",
    "                                   edgecolor='black')"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
